-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown #29092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Also, I messed up one commit, that made the bot add wrong labels. |
|
ok to test |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
Outdated
Show resolved
Hide resolved
|
Test build #128783 has finished for PR 29092 at commit
|
|
Pinging @viirya and @cloud-fan for possible second review |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #129514 has finished for PR 29092 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #129538 has finished for PR 29092 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
|
Test build #129695 has finished for PR 29092 at commit
|
| InferFiltersFromConstraints) :: | ||
| Batch("Operator Optimization after Inferring Filters", fixedPoint, | ||
| rulesWithoutInferFiltersFromConstraints: _*) :: | ||
| operatorOptimizationRuleSet: _*) :: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: operatorOptimizationRuleSet -> rulesWithoutInferFilters?
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
Outdated
Show resolved
Hide resolved
|
Looks okay otherwise. |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #129720 has finished for PR 29092 at commit
|
...t/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromGenerateSuite.scala
Outdated
Show resolved
Hide resolved
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
GA passed. Thanks! Merged to master. |
|
Test build #129738 has finished for PR 29092 at commit
|
| // Exclude child's constraints to guarantee idempotency | ||
| val inferredFilters = ExpressionSet( | ||
| Seq( | ||
| GreaterThan(Size(g.children.head), Literal(0)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be useful to filter rows before join, but can we pushdown Size expression to datasource?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so...
What changes were proposed in this pull request?
Add
And(IsNotNull(e), GreaterThan(Size(e), Literal(0)))filter before Explode, PosExplode and Inline, whenouter = false.Removed unused
InferFiltersFromConstraintsfromoperatorOptimizationRuleSetto avoid confusion that happened during the review process.Why are the changes needed?
Predicate pushdown will be able to move this new filter down through joins and into data sources for performance improvement.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test